1. INTRODUCTION

Research on flight anomaly detection has emerged over the last decade. One study applied a regression
model to a FOQA (Flight Operations Quality Assurance) data set with 26 parameters [1]. Another presented
anomaly detection on FDR flight data [2] using a density-based clustering technique (DBSCAN); because it
clustered the entire FDR data of a flight, it could not detect flight anomalies early.

Our interest in this research topic stems from aviation accidents such as the crash of AirAsia flight
QZ8501 from Surabaya (SUB) to Singapore (SIN) in December 2014. According to the 2015 investigation by the
National Transportation Safety Commission, there were anomalies on the flight route: the aircraft deviated
significantly from the latitude and longitude coordinates it should have followed [3]. Figure 1 shows the
anomalies in latitude and longitude for this flight. The dashed line is the trajectory the plane should have
followed, while the red line is the actual trajectory, which ended in a crash. The problem is that the
detection of these anomalies was not preventive.

Based on the above flight problems and previous studies, we present real-time flight anomaly
detection using a distance-based clustering technique (K-Means). The training data come from Automatic
Dependent Surveillance Broadcasting (ADS-B) data collected over thirty days. The ADS-B data are grouped
into two 3-dimensional features: P1 = (latitude, longitude, speed) and P2 = (latitude, longitude, travel
time). The clustering process used for anomaly detection is based on the habitual flight pattern of a call
sign. Cluster performance is then assessed with an internal validity test based on the silhouette index. The
contribution of this research is the determination of the optimum partition on 3-dimensional ADS-B features.
The optimum partition is the one with the minimum percentage of identification error (False Identification
Rate) for each 3-dimensional feature.
Figure 1. Flight route anomaly on AirAsia QZ8501 (SUB-SIN)


Research related to anomaly detection can be categorized into one-dimensional analysis and
multi-dimensional analysis. In one-dimensional analysis, a statistical approach based on deviation values is
used, such as in [4], where three methods (Interquartile Range, Moving Average, and Median Absolute
Deviation) are used to detect anomalies in the IP addresses that access a network.

Multi-dimensional analysis can be done with clustering methods. In [5], an automatic choice of the
initial number of centers k was proposed for K-Means to form clusters on big data using the MapReduce
paradigm. In [6], one of the cluster quality measures (i.e., the silhouette index) was used to measure the
quality of groups of documents that carry financial risk. The silhouette index is used because it is a simple
way to measure how well a document is placed in a cluster. In [7], Euclidean distance was used to measure the
similarity between training data and testing data in two classes; the result was the accuracy percentage of
the testing data assigned to a class based on the Euclidean values.

Flight anomaly detection using clustering techniques has been the focus of several papers. In [2],
DBSCAN was used to group FOQA (Flight Operations Quality Assurance) flight data. The result was the set of
data detected as anomalous based on a certain density. Several anomaly criteria were considered, including
high and low energy states, unusual pitch excursions, abnormal flap settings, and high wind conditions.
The flight phases covered were takeoff and landing. The proposed method was: 1) conversion of the
time-series data set into vectors; 2) reduction of the data dimension using Principal Component Analysis in
each flight phase, from 6188 to 77 in the takeoff phase and from 6279 to 95 in the landing phase; 3)
density-based cluster analysis (DBSCAN) to obtain the outlier area in each flight phase. The study was
developed further in 2015 to detect abnormal flights [8], so that it could assist experts in handling anomalous
conditions in operational aviation. The difference from the previous research is that this study did not
require predefined flight anomaly criteria. The proposed method, ClusterAD-Flight, uses data mining
techniques to observe and analyze common patterns. It assumes that flights share a common pattern; flights
that deviate from it are possibly anomalous and can be used as a reference in aviation safety management.

Based on the above research, several aspects still need to be developed. The first is time-based
anomaly detection. Previous analysis was evaluative: events occurred first, and the data obtained from them
were analyzed afterwards. As a result, no early detection of flight anomalies was available.

In addition, density-based clustering is not adopted as the main clustering approach here, because
it still requires the minimum number of points (MinPts) to be determined. Therefore, a distance-based
clustering approach can be an alternative solution. A real-time anomaly detection approach can be developed
in this research as a step toward an early warning system for flight anomalies.

Related research on flight clustering techniques addresses fault / anomaly detection [9] by
identifying abnormal circumstances [8]. The technique used was to reduce the parameters before the data
were clustered [2]. The data were then clustered according to density [8], [10], so that areas outside the
dense regions were potentially anomalous. Figure 2 shows the stages of density-based anomaly detection.

Figure 2. Anomaly detection stages with the density-based clustering technique (DBSCAN)


The stages are as follows. First, the time-series flight data (FDR) are transformed into
multidimensional vectors. Second, the parameters are reduced with Principal Component Analysis (PCA) to
obtain the main parameters contained in the multidimensional vectors. Last, anomaly detection is performed
with the DBSCAN clustering technique. The result is the set of areas that fall below a certain minimum
density and are therefore potentially anomalous.

A problem with density-based clustering techniques is the MinPts parameter (minimum number of
points) for the data set, whose value requires further analysis. An illustration of the density-based
clustering technique (DBSCAN) is shown in Figure 3.

Figure 3. Illustration of DBSCAN cluster technique


The illustration shows two clusters: cluster 1 has 13 members and cluster 2 has 6. The minimum
number of members per cluster (MinPts) was 5 and the value of ε was 1. With this technique, one data point
was identified as an anomaly.

In addition, the validity / performance of the resulting clusters was not measured, so there was no
indicator of whether the clusters were well formed. In contrast, our research uses a distance-based
clustering technique, K-Means [11]. After the clusters are formed, cluster validity is measured with an
internal validity method. The internal validity method used, for which a higher index value is better, is the
silhouette index [12].

The link with previous research is that the flight data used have a time-series pattern [2]: the
flight data recording (FDR) parameters [10] are received one record per timestamp. The FDR parameters
include altitude, speed, latitude, longitude, and time. In our research, the FDR data are based on
Automatic Dependent Surveillance Broadcasting (ADS-B) [13], [14]. One advantage of ADS-B is accessibility,
since the data can be accessed over the HTTP protocol. It can therefore serve ATC as a secondary FDR source
(in addition to radar data).

2. PROPOSED METHOD

Several studies address anomaly detection in flight, for example by using clustering methods [2], [8].
An important element in aviation anomaly detection is the determination of waypoints [15], the specific
locations that the aircraft passes before reaching its destination. The area between one waypoint and the
next is called a segment [16]. The segments are defined as: waypoint1 < segment1 < waypoint2; waypoint2 <
segment2 < waypoint3; and so on until waypoint7 < segment7 < waypoint8. Segment determination starts from
segment2, because the ADS-B data used cover the plane's position in cruise [1], so segment1 is unavailable.

Next, the data set in each segment is clustered with K-Means [17]-[20]. To obtain good cluster
results, cluster analysis is performed to get the value of a validity index. Validity is measured by two
criteria: compactness (cohesion) and separation [21]. To obtain the optimum cluster, an internal validity
evaluation [22] is used. Validity indices fall into two categories: those for which the maximum value is
best [19], [23]-[25] and those for which the minimum value is best [21], [23], [26].

The next stage is cluster-based anomaly detection. After the clusters in a segment are generated,
the distance between each centroid and the other centroids is measured to detect potentially anomalous
areas. The distance measure used is the Euclidean distance [20]. The cluster whose centroid is farthest
from the other centroids is treated as a potential anomaly area.
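
The centroid comparison described above can be sketched as follows. This is a minimal illustration, not the
authors' implementation: the function name `most_isolated_centroid`, the use of the mean distance to the
other centroids as the "farthest" criterion, and the example centroid values are all assumptions.

```python
import numpy as np

def most_isolated_centroid(centroids):
    """Return (index, mean_distance) of the centroid farthest from the others."""
    c = np.asarray(centroids, dtype=float)
    # Pairwise Euclidean distances between all centroids (k x k matrix).
    diff = c[:, None, :] - c[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Average distance from each centroid to the other k - 1 centroids
    # (assumed aggregation rule; the diagonal contributes zero).
    mean_to_others = dist.sum(axis=1) / (len(c) - 1)
    idx = int(np.argmax(mean_to_others))
    return idx, float(mean_to_others[idx])

# Hypothetical centroids in (latitude, longitude, speed) space.
example = [(-5.8, 115.8, 420.0), (-5.9, 115.7, 445.0), (-5.9, 115.8, 460.0)]
idx, score = most_isolated_centroid(example)
```

Because speed is on a much larger numeric scale than latitude and longitude, it dominates the unscaled
Euclidean distance in this sketch.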


2.1. Research Framework 

The framework of this study is shown in Figure 4; it supports the real-time determination of
segment-based anomalies. The focus is on determining the detection area by finding the optimum partition
with distance-based clustering.

Figure 4. Research framework

The proposed method in this research is Building Segment, Partition, Grouping, and Validity
Measurement (BSPGV). The target is an anomaly detection model based on 3D data feature partitions. First,
the ADS-B data source (training data) is divided into several segments based on the waypoints. Second, the
data in each segment are partitioned. Third, the data are grouped with the K-Means clustering method.
Fourth, cluster validity is measured. The optimum partition is determined by running testing data through
the BSPGV model. The result is an identification error (False Identification Rate, FIR).


2.2. Data set 


The parameters contained in the ADS-B data are: timestamp, UTC, call sign, position, altitude,
speed, and direction. The ADS-B data for call sign LNI860 (SUB-PLW) are shown in Table 1.

Table 1. ADS-B Data Set for LNI860 Flight

Timestamp    Date        Time      Traveling Time  Latitude  Longitude  Altitude  Speed
                                   (seconds)                                      (v)
1480582105   12/01/2016  15:48:25   0              -6.601    115.329    37000     456
1480582099   12/01/2016  15:48:19   6              -6.609    115.318    37000     456
1480582063   12/01/2016  15:47:43  36              -6.643    115.249    37000     456
1480582002   12/01/2016  15:46:42  61              -6.695    115.131    37000     462
1480581939   12/01/2016  15:45:39  63              -6.749    115.005    37000     466


Based on the ADS-B data source in Table 1, two 3-dimensional (3D) features are proposed in this
study. The first feature (P1) has the parameters latitude, longitude, and speed. The second feature (P2) has
the parameters latitude, longitude, and traveling time.
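
As an illustration, the two features can be assembled from the rows of Table 1 as sketched below. The
function name `build_features` is hypothetical, and the sketch assumes that the traveling time of a record is
the gap in seconds between its timestamp and that of the preceding row, which is consistent with the values
0, 6, 36, 61, 63 listed in Table 1.

```python
import numpy as np

# Rows from Table 1: (timestamp, latitude, longitude, speed), newest first.
records = [
    (1480582105, -6.601, 115.329, 456),
    (1480582099, -6.609, 115.318, 456),
    (1480582063, -6.643, 115.249, 456),
    (1480582002, -6.695, 115.131, 462),
    (1480581939, -6.749, 115.005, 466),
]

def build_features(rows):
    """Return the P1 and P2 feature matrices, one row per ADS-B record."""
    data = np.asarray(rows, dtype=float)
    ts, lat, lon, speed = data.T
    # Assumed definition: seconds between a row and the row listed before it.
    travel = np.concatenate(([0.0], ts[:-1] - ts[1:]))
    p1 = np.column_stack((lat, lon, speed))   # (latitude, longitude, speed)
    p2 = np.column_stack((lat, lon, travel))  # (latitude, longitude, traveling time)
    return p1, p2

p1, p2 = build_features(records)
```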


2.3. Waypoint 
This research uses the flight route from Surabaya (SUB) to Palu (PLW) (call sign LNI860). The route
passes through eight waypoints, given by the following latitude and longitude coordinates.

a) Waypoint1: Surabaya, with (latitude, longitude) = (-7.373, 112.772);
b) Waypoint2: Fando, with (latitude, longitude) = (-6.973, 113.985);
c) Waypoint3: Kasol, with (latitude, longitude) = (-6.568, 115.173);
d) Waypoint4: Dasty, with (latitude, longitude) = (-6.173, 116.330);
e) Waypoint5: Endog, with (latitude, longitude) = (-5.877, 117.202);
f) Waypoint6: Gamal, with (latitude, longitude) = (-2.863, 118.038);
g) Waypoint7: Rudal, with (latitude, longitude) = (-2.662, 118.711);
h) Waypoint8: Palu, with (latitude, longitude) = (-0.885, 119.962).


Figure 5 shows the distribution of the waypoints in latitude and longitude coordinates.


2.4. Segment 
A segment is the area between one waypoint and the next. It is formulated as
$W_n < \mathrm{SEGMENT}_n < W_{n+1}$ ($n = 1, 2, \dots, k$), and the resulting ranges are shown in Table 2.

Figure 5. Distribution of waypoints for LNI860 flight


Table 2. Segments in LNI860 Flight

Segment     Range Area Waypoint
SEGMENT1    Waypoint1 to Waypoint2
SEGMENT2    Waypoint2 to Waypoint3
SEGMENT3    Waypoint3 to Waypoint4
SEGMENT4    Waypoint4 to Waypoint5
SEGMENT5    Waypoint5 to Waypoint6
SEGMENT6    Waypoint6 to Waypoint7
SEGMENT7    Waypoint7 to Waypoint8
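
A record can be assigned to a segment along the lines of the following sketch. The paper does not state how a
position is compared with a waypoint; here we assume the comparison is made on longitude, which increases
monotonically along the Section 2.3 waypoints. The function name `segment_of` is hypothetical.

```python
from bisect import bisect_right

# Waypoint longitudes for LNI860 (SUB-PLW), taken from Section 2.3.
WAYPOINT_LON = [112.772, 113.985, 115.173, 116.330,
                117.202, 118.038, 118.711, 119.962]

def segment_of(longitude):
    """Return the 1-based segment number, or None outside the route."""
    if not (WAYPOINT_LON[0] <= longitude <= WAYPOINT_LON[-1]):
        return None
    # bisect_right counts the waypoints at or before this longitude.
    n = bisect_right(WAYPOINT_LON, longitude)
    # A position exactly on the last waypoint belongs to the last segment.
    return min(n, len(WAYPOINT_LON) - 1)
```

For example, the first record of Table 1 (longitude 115.329) lies between Waypoint3 and Waypoint4 and is
therefore placed in the third segment.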

Figure 6 shows the distribution of the 30 days of ADS-B training data over the segments and
waypoints.

Figure 6. Distribution of ADS-B training data over 30 days for LNI860 flight


2.5. Partitions (sub segments) 

Partitioning (sub segmenting) is the process of dividing the area of each segment. It is done
before clustering so that the nature of the data in a cluster can be characterized more specifically. The
partitions are determined by (3):


n/2, n/3, n/4, n/5 (3)


Partitioning is applied to both features: the 3D feature of position and speed (latitude, longitude,
speed) and the 3D feature of position and travel time (latitude, longitude, traveling time).
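
One possible reading of the partition step is sketched below. It assumes that a segment with n records is
split into k = 2, 3, 4, or 5 contiguous sub segments of near-equal size after ordering the records by
longitude; the paper does not spell out this rule, and the function name `partition_segment` and the demo
data are hypothetical.

```python
import numpy as np

def partition_segment(points, k):
    """Split the rows of `points` into k contiguous sub segments of near-equal size."""
    pts = np.asarray(points, dtype=float)
    # Order rows along the route; column 1 is assumed to hold longitude.
    ordered = pts[np.argsort(pts[:, 1], kind="stable")]
    return np.array_split(ordered, k)

# Ten hypothetical (latitude, longitude, speed) rows inside one segment.
rng = np.random.default_rng(0)
demo = np.column_stack((
    rng.uniform(-6.9, -6.6, 10),
    rng.uniform(114.0, 115.1, 10),
    rng.uniform(440, 470, 10),
))
schemes = {k: partition_segment(demo, k) for k in (2, 3, 4, 5)}
```

Each entry of `schemes` then holds the sub segments on which clustering would be run separately.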


2.6. False Identification Rate (FIR) 

False Identification Rate (FIR) is an indicator of the error rate in identifying whether testing
data are located in the nearest cluster. Figure 7 shows the first FIR scenario, applied to testing data with
multiple clusters.

Figure 7. Scenario 1 FIR (distance measurement in data testing)

In the first scenario, the distance between each testing record and the centroid of each cluster is
measured. The testing data come from ten days of the training data (in-set testing). The cluster with the
minimum distance is the one the testing record is expected to belong to. Figure 8 shows the process of
checking the testing data against the nearest cluster, together with the determination of the FIR value.

Figure 8. Scenario 2 FIR (checking data testing on a cluster)


In the second scenario, each testing record is checked to see whether it is actually located in the
cluster with the minimum centroid distance. FIR = 0 if the data is in the cluster and FIR = 1 if the data is
not in the cluster. The FIR computation is given in the following pseudocode.


Algorithm: Compute FIR
Input: dataTesting

// distance measurement
distC1(test, CentC1);
distC2(test, CentC2);
distC3(test, CentC3);
minDist ← distC1;
cluster ← 1;
dist ← [distC1, distC2, distC3];
for i ← 1 to length(dist) do
    if dist(i) < minDist then
        minDist ← dist(i);
        cluster ← i;
    endif
endfor
// compute FIR
for j ← 1 to length(test) do
    if test(j) == cluster(j) then
        FIR(j) ← 0;
    else
        FIR(j) ← 1;
    endif
endfor


The False Identification Rate (FIR) algorithm consists of two processes. First, the distance between
each centroid and the testing data is calculated with the Euclidean distance in the functions distC1, distC2,
and distC3, one per cluster centroid (k = 3). The minimum distance indicates the cluster the testing data is
likely to be in. Two variables are used: minDist stores the minimum distance, and cluster identifies the
corresponding cluster. Second, the FIR is calculated by checking whether the test data is in the selected
cluster: FIR = 0 (FIR(j) ← 0) if the test data is in the selected cluster, and FIR = 1 (FIR(j) ← 1) if it
is not.
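
The two processes can be sketched in Python as follows. This is an illustration under stated assumptions, not
the authors' code: the function names are hypothetical, and `true_labels` is assumed to hold the cluster that
each in-set testing record was assigned to during training.

```python
import numpy as np

def nearest_cluster(test_points, centroids):
    """Index of the closest centroid (Euclidean distance) for each test point."""
    x = np.asarray(test_points, dtype=float)
    c = np.asarray(centroids, dtype=float)
    dist = np.linalg.norm(x[:, None, :] - c[None, :, :], axis=2)
    return dist.argmin(axis=1)

def false_identification_rate(test_points, true_labels, centroids):
    """Return per-record FIR flags (0 = match, 1 = mismatch) and the FIR percentage."""
    predicted = nearest_cluster(test_points, centroids)
    flags = (predicted != np.asarray(true_labels)).astype(int)
    return flags, 100.0 * flags.mean()

# Hypothetical centroids, test records, and training-time cluster labels.
cents = [(0, 0, 0), (10, 0, 0), (0, 10, 0)]
tests = [(1, 0, 0), (9, 1, 0), (1, 8, 0), (6, 0, 0)]
flags, fir_percent = false_identification_rate(tests, [0, 1, 2, 0], cents)
```

In this toy case the last record is nearer to the second centroid than to its own cluster, so one of the four
records is misidentified.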


3. RESEARCH METHOD 
3.1. Clustering 

The clustering method used is a distance-based method, K-Means. It was chosen because the data
grouped into a cluster are determined by distance (Euclidean distance) and a centroid is produced for each
cluster. The centroids were initialized with random values. Clustering was performed per segment on the
3-dimensional features, each of which combines three ADS-B parameters: P1 = (latitude, longitude, speed)
and P2 = (latitude, longitude, traveling time). The cluster results are shown in Figure 9 for P1 and
Figure 10 for P2.

For P1, the cluster centroids are Centroid1 = (-5.779, 115.796, 422.570); Centroid2 = (-5.911,
115.748, 459.472); Centroid3 = (-5.930, 115.767, 444.552). The number of data points per cluster (nCluster)
is: nCluster1 = 151; nCluster2 = 354; nCluster3 = 433. For P2, the cluster centroids are Centroid1 =
(-5.871, 115.757, 73.57); Centroid2 = (-6.088, 115.498, 332); Centroid3 = (-5.927, 115.788, 11.26). The
number of data points per cluster is: nCluster1 = 531; nCluster2 = 19; nCluster3 = 388.

The clusters are plotted on Cartesian x, y, z axes determined by the 3-dimensional features P1
(latitude, longitude, speed) and P2 (latitude, longitude, traveling time). For feature P1, shown in
Figure 9, cluster 1 (blue dots) contains 151 data points, cluster 2 (orange dots) contains 354, and cluster 3
(red dots) contains 433. The centroid of cluster 1 is (-5.779, 115.796, 422.570), the centroid of cluster 2
is (-5.911, 115.748, 459.472), and the centroid of cluster 3 is (-5.930, 115.767, 444.552).

For feature P2, shown in Figure 10, cluster 1 (blue dots) contains 531 data points, cluster 2
(orange dots) contains 19, and cluster 3 (red dots) contains 388. The centroid of cluster 1 is (-5.871,
115.757, 73.57), the centroid of cluster 2 is (-6.088, 115.498, 332), and the centroid of cluster 3 is
(-5.927, 115.788, 11.26).
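
For reference, a minimal K-Means (Lloyd's algorithm) with random initialization is sketched below. It is a
generic textbook version, not the implementation used for Figures 9 and 10; the function name `kmeans`, its
parameters, and the demo data are assumptions, and the initial centroids are drawn from the data points.

```python
import numpy as np

def kmeans(points, k=3, max_iter=100, seed=0):
    """Return (centroids, labels) from a basic K-Means run."""
    x = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Random initialization: pick k distinct data points as starting centroids.
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(max_iter):
        dist = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        new_centroids = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment against the returned centroids.
    dist = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dist.argmin(axis=1)

# Hypothetical 3D data: three groups of 20 points each.
rng = np.random.default_rng(42)
demo = np.vstack([rng.normal(m, 0.1, size=(20, 3)) for m in (0.0, 5.0, 10.0)])
centroids, labels = kmeans(demo, k=3)
```

Because the starting centroids are random, different seeds can converge to different partitions, which is
one reason the choice of initialization matters for the results.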


3.2. Cluster Validation with Silhouette index 

To obtain good cluster results, the cluster validity must be measured. This stage, called cluster
analysis, tests how well the clusters have been formed. The silhouette index measures how similar an object
is to its own cluster (cohesion) compared with other clusters (separation). Silhouette values range from -1
to +1. A value close to the maximum indicates that the object is well matched to its own cluster and poorly
matched to other clusters. If many objects have a high value, the clustering configuration is appropriate. If
many points have low or negative values, the clustering configuration may have too many or too few clusters.

The silhouette can be calculated with any distance metric, such as the Euclidean distance. The
silhouette index is calculated with (1) and (2).


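$$ s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}} \qquad (1) $$

$$ s(i) = \begin{cases} 1 - \dfrac{a(i)}{b(i)}, & \text{if } a(i) < b(i) \\ 0, & \text{if } a(i) = b(i) \\ \dfrac{b(i)}{a(i)} - 1, & \text{if } a(i) > b(i) \end{cases} \qquad (2) $$

Where,

a) a(i): the average distance from i to all other points in its own cluster;
b) c(i): the average distance from i to all points in another cluster (a neighboring cluster);
c) b(i): the minimum c(i).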
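
The silhouette value of each point can be computed directly from these definitions, as in the sketch below.
The function name `silhouette_values` is hypothetical; the sketch assumes Euclidean distance, at least two
clusters, and the common convention s(i) = 0 for a point that is alone in its cluster.

```python
import numpy as np

def silhouette_values(points, labels):
    """Return s(i) for every point, following (1)."""
    x = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    clusters = np.unique(labels)
    s = np.zeros(len(x))
    for i in range(len(x)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton cluster: s(i) stays 0
        # a(i): mean distance to the other members of the same cluster.
        a = dist[i, same].sum() / (n_same - 1)
        # b(i): smallest mean distance c(i) to any other cluster.
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        s[i] = 0.0 if denom == 0 else (b - a) / denom
    return s

# Four hypothetical one-dimensional points in two clusters.
s = silhouette_values([[0.0], [1.0], [10.0], [11.0]], [0, 0, 1, 1])
```

For the first point, a = 1 and b = 10.5, so s = 9.5 / 10.5, which is close to +1 as expected for
well-separated clusters. The minimum and maximum of `s` over a partition give a range of the form
min < sh < max.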

4. RESULTS AND DISCUSSION

The result of this study is a set of clusters formed on each partition, determined from the 3D
features. Each cluster is then analyzed computationally in terms of the number of iterations, replications,
total distance (total sum of distances), and travel time. Cluster analysis is used to assess how well the
clusters are formed, through internal cluster validity measured by the silhouette index, whose range is
-1 < sh < +1. The last stage is the testing process. The testing data come from the training data (in-set
training data), using a 10-day sample. With the False Identification Rate (FIR), the minimum error
percentage on each partition of each 3D feature determines which partition is optimal. The following
experiments and analysis yield the anomaly detection model for the 3D features P1 and P2.


4.1. Features 3-Dimension 

For the 3-dimensional features, two models are generated: P1 = (latitude, longitude, speed) and
P2 = (latitude, longitude, traveling time). Each is a cluster-based anomaly detection model. The models are
characterized by the following aspects. First, the potential anomaly area is identified from the farthest
centroid distance and the smallest amount of data among the clusters generated. Second, the computational
aspects are measured per partition: replication, iteration, total distance, and travel time.


4.2. Features 3-Dimension P1 
The result of grouping with K = 3 as the anomaly detection model on P1 is shown in Figure 11.

Figure 11(a). Clustering in partition I of five partitions
Figure 11(b). Clustering in partition II of five partitions
Figure 11(c). Clustering in partition III of five partitions
Figure 11(d). Clustering in partition IV of five partitions
Figure 11(e). Clustering in partition V of five partitions


The computational analysis of each partition on feature P1 is shown in Table 3a and Table 3b. The
internal measurement of cluster validity based on the silhouette index on P1 is shown in the silhouette
charts in Figure 12.


Table 3a. Computational Aspect in Each Partition on P1 (Two Partitions and Three Partitions)

Computational Aspect    Two Partitions    Three Partitions
Partition               I      II         I      II     III
Replication             1      1          1      1      1
Iteration               4      2          1      4      2
Total sum of Distance   225    1026       1607   979    777


Table 3b. Computational Aspect in Each Partition on P1 (Four Partitions and Five Partitions)

Computational Aspect    Four Partitions                          Five Partitions
Partition               I      II     III           IV           I      II     III    IV     V
Replication             1      1      1             1            1      1      1      1      1
Iteration               2      10     [illegible]   4            5      1      5      1      [illegible]
Total sum of Distance   1381   817    453           566          1091   608    562    304    566

Figure 12(a). Silhouette index from cluster in partition I of five partitions
Figure 12(b). Silhouette index from cluster in partition II of five partitions
Figure 12(c). Silhouette index from cluster in partition III of five partitions
Figure 12(d). Silhouette index from cluster in partition IV of five partitions
Figure 12(e). Silhouette index from cluster in partition V of five partitions


The silhouette index in each partition is shown in Table 4a and Table 4b.


Table 4(a). Silhouette Index in Each Partition on P1 (Two Partitions and Three Partitions)

Scheme              Partition   Silhouette index       Information
Two Partitions      I           -0.001 < sh < 0.998    minus (-)
Two Partitions      II          0.178 < sh < 1         -
Three Partitions    I           0.123 < sh < 0.997     -
Three Partitions    II          0.127 < sh < 0.99      -
Three Partitions    III         -0.003 < sh < 1        minus (-)


Table 4(b). Silhouette Index in Each Partition on P1 (Four Partitions and Five Partitions)

Scheme              Partition   Silhouette index       Information
Four Partitions     I           0.122 < sh < 0.997     -
Four Partitions     II          0.079 < sh < 0.99      -
Four Partitions     III         0.122 < sh < 1         -
Four Partitions     IV          0.013 < sh < 0.99      -
Five Partitions     I           0.135 < sh < 0.99      -
Five Partitions     II          0.728 < sh < 0.99      -
Five Partitions     III         0.291 < sh < 0.99      -
Five Partitions     IV          0.506 < sh < 1         -
Five Partitions     V           0.013 < sh < 0.99      -


Features 3-Dimension P2
The clustering results with K = 3 as the anomaly detection model on P2 are shown in Figure 13. The
computational analysis of each partition on feature P2 is shown in Table 5a and Table 5b. The silhouette
index in each partition is shown in Table 6a and Table 6b.


Table 5(a). Computational Aspect in Each Partition on P2 (Two Partitions and Three Partitions)

Computational Aspect    Two Partitions     Three Partitions
Partition               I       II         I      II      III
Replication             1       1          1      1       1
Iteration               5       1          3      2       1
Total sum of Distance   82379   2262       7210   47547   1969


Table 5(b). Computational Aspect in Each Partition on P2 (Four Partitions and Five Partitions)

Computational Aspect    Four Partitions                        Five Partitions
Partition               I      II            III    IV         I        II     III    IV     V
Replication             1      1             1      1          1        1      1      1      1
Iteration               2      [illegible]   1      1          1        1      1      1      1
Total sum of Distance   5086   39187         460    1756       611091   1250   4353   432    1756


Table 6(a). Silhouette Index in Each Partition on P2 (Two Partitions and Three Partitions)

Scheme              Partition   Silhouette index       Information
Two Partitions      I           -0.058 < sh < 0.901    minus (-)
Two Partitions      II          0.452 < sh < 1         -
Three Partitions    I           0.293 < sh < 1         -
Three Partitions    II          0.143 < sh < 1         -
Three Partitions    III         0.413 < sh < 1         -

Table 6(b). Silhouette Index in Each Partition on P2 (Four Partitions and Five Partitions)

Scheme              Partition   Silhouette index       Information
Four Partitions     I           0.197 < sh < 1         -
Four Partitions     II          0.260 < sh < 1         -
Four Partitions     III         0.594 < sh < 0.997     -
Four Partitions     IV          0.339 < sh < 1         -
Five Partitions     I           0.235 < sh < 1         -
Five Partitions     II          0.526 < sh < 0.986     -
Five Partitions     III         0.215 < sh < 0.983     -
Five Partitions     IV          0.836 < sh < 1         -
Five Partitions     V           0.339 < sh < 1         -

Figure 13(a). Clustering in partition I of four partitions
Figure 13(b). Clustering in partition II of four partitions
Figure 13(c). Clustering in partition III of four partitions
Figure 13(d). Clustering in partition IV of four partitions


The internal measurement of cluster validity based on the silhouette index on P2 is shown in the
silhouette charts in Figure 14.

Figure 14(a). Silhouette index from cluster in partition I of four partitions
Figure 14(b). Silhouette index from cluster in partition II of four partitions
Figure 14(c). Silhouette index from cluster in partition III of four partitions
Figure 14(d). Silhouette index from cluster in partition IV of four partitions


The traveling time for features P1 and P2 is grouped by partition. The same traveling time was
obtained for P1 and P2. Table 7a and Table 7b show the travel time per partition.


Table 7(a). Traveling Time of P1 and P2 per Partition (Two Partitions and Three Partitions)

                                    Two Partitions    Three Partitions
Partition                           I      II         I      II     III
Traveling Time per Partition (min)  7.5    2.8        3.9    3.9    1.7


Table 7(b). Traveling Time of P1 and P2 per Partition (Four Partitions and Five Partitions)

                                    Four Partitions                   Five Partitions
Partition                           I     II    III           IV      I             II    III   IV    V
Traveling Time per Partition (min)  3.1   3.6   [illegible]   0.7     [illegible]   1.7   1.9   1.1   0.7


4.3. False Identification Rate (FIR) 

The False Identification Rate (FIR) measurement is applied to the 3-dimensional features P1 and P2.
For each feature there are two processes. First, the minimum distance between the test data and the centroid
of each cluster is calculated. Second, each test record over the ten days is checked to see whether it
actually belongs to the selected cluster. If it does not, FIR = 1; if it does, FIR = 0. The FIR graph
represents the size of the identification error in the trial process: the graph rises when the
identification error is large and falls when the identification error is small. The FIR graphs show the
identification errors from the experiments on the 3D features.

The FIR percentage per partition for the 3D feature P1 is shown in Figure 15, and the FIR
percentage per partition for the 3D feature P2 is shown in Figure 16. The average FIR percentage of the 3D
features P1 and P2 per partition is shown in Figure 17.


Figure 15. FIR percentage on each partition on 3D feature P1

Figure 17. Average FIR percentage for P1 and P2


Based on the FIR percentage per partition on feature P1, the optimal partition is obtained with
five partitions, where the minimum FIR percentage is 3%. For feature P2, the optimum partition occurs with
four partitions, where the FIR percentage is 18%.

Based on the FIR analysis of P1 and P2, the optimum partition for detecting anomalous flight
routes is better determined with feature P1 (latitude, longitude, speed) than with feature P2 (latitude,
longitude, traveling time). Feature P2 therefore needs further analysis, for example by optimizing centroid
initialization so that K-Means clustering performs optimally.


5. CONCLUSION 

In this paper, a new method is proposed for real-time flight anomaly detection based on ADS-B
data with three-dimensional features. It has several steps. First, the segments on the flight route are
determined from the waypoints. Second, each segment is partitioned according to n/2, n/3, n/4, and n/5.
Third, distance-based clustering with k = 3 is applied to obtain the centroids. Fourth, the Euclidean
distance between each centroid and the other centroids is measured; a distant centroid indicates that its
cluster is a potentially anomalous area. Fifth, the anomaly detection model is tested, using the False
Identification Rate (FIR) as the indicator. For the 3D feature P1, the optimal partition is five partitions,
with an average FIR percentage of 3%. For the 3D feature P2, the optimal partition is four partitions, with
an average FIR percentage of 18%.

In future work, the FIR percentage of P2, which is above 10%, could be reduced. This may be
achieved by optimizing centroid initialization so that the clusters are optimal and the FIR reaches its
minimum. In addition, the research will be developed further toward a real-time anomaly detection approach,
in which the time-domain aspect of determining the detection area becomes the deciding factor and
clustering methods are used to determine the optimum cluster results. Finally, because the anomaly
detection process can be computationally demanding, a parallel computing process is required.